Fix flaky test execution caused by Thread
#34966
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thank you!
fix Co-authored-by: ydshieh <[email protected]>
Hi @ydshieh - FYI this change appears to have broken some unit tests in DeepSpeed, specifically where we download a model. I'll look at ways to resolve this; I assume it is an issue in multiprocessing/threading.
@loadams Thank you for informing us. Hope there is a way that could work in both cases 🙏 Let me know if there is anything we can help with here.
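As a rough illustration of the pattern under discussion, here is a minimal sketch of downloading a model from a worker thread; the model id and threading layout are assumptions for illustration, not taken from this PR:

```python
# Minimal sketch: call from_pretrained from a background thread, the pattern
# suspected above. The model id below is an illustrative assumption.
from threading import Thread

from transformers import AutoModel

def load_model():
    # Downloads the checkpoint (or reads it from the local cache) and builds the model.
    model = AutoModel.from_pretrained("hf-internal-testing/tiny-random-bert")
    print("loaded:", type(model).__name__)

t = Thread(target=load_model)
t.start()
t.join()  # an unjoined (or daemon) thread can be torn down mid-download, one plausible source of flakiness
```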
fix Co-authored-by: ydshieh <[email protected]>
@ydshieh I'm also experiencing issues here for Sentence Transformers / Cross Encoder models, see UKPLab/sentence-transformers#3129. In short: if one of my users loads any model that only has a `pytorch_model.bin` checkpoint, e.g.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("embaas/sentence-transformers-gte-base")
```

or

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
```

which internally call

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("embaas/sentence-transformers-gte-base")
```

or

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2")
```

then all of these get the same `OSError` about a missing `pytorch_model.bin` that is quoted in the DeepSpeed report further below.
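One way affected users could sidestep the failing code path (a hedged suggestion, not something proposed in this thread) is to fetch the repository up front on the main thread and load from the local directory:

```python
# Hedged workaround sketch, not from this thread: pre-download the repository
# with huggingface_hub, then point from_pretrained at the local copy so the
# threaded download path is never exercised.
from huggingface_hub import snapshot_download
from transformers import AutoModel

local_dir = snapshot_download("embaas/sentence-transformers-gte-base")
model = AutoModel.from_pretrained(local_dir)
```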
Hi @loadams FYI: it's reverted back to use
@ydshieh - thanks for the update!
…ues in tests (#6822)

Changes from huggingface/transformers#34966 caused the `nv-torch-latest-v100` tests to fail with the following error:

```
File "/tmp/azureml/cr/j/e4bfd57a509846d6bbc4914639ad248d/exe/wd/actions-runner/_work/DeepSpeed/DeepSpeed/unit-test-venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3941, in from_pretrained
    raise EnvironmentError(
OSError: Can't load the model for 'hf-internal-testing/tiny-random-VisionEncoderDecoderModel-vit-gpt2'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'hf-internal-testing/tiny-random-VisionEncoderDecoderModel-vit-gpt2' is the correct path to a directory containing a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
```

Sample failure here: https://github.com/microsoft/DeepSpeed/actions/runs/12169422174/job/33942348835?pr=6794#step:8:3506

This was resolved on the Transformers side here: huggingface/transformers#35236
What does this PR do?
As discussed offline:
To check the PR works (make sure tensorflow is available), run `script.py`, where `script.py` is:
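The original contents of `script.py` did not survive extraction; a hypothetical stand-in consistent with the PR's subject, where the model id and iteration count are illustrative assumptions, might be:

```python
# Hypothetical stand-in for the elided script.py. Repeatedly loads a small
# checkpoint so that intermittent (flaky) failures have a chance to show up.
# Model id and iteration count are illustrative assumptions.
from transformers import AutoModel

for i in range(5):
    model = AutoModel.from_pretrained("hf-internal-testing/tiny-random-bert")
    print(f"run {i}: loaded {type(model).__name__}")
```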